 chatgpt text


ChatGPT-generated texts show authorship traits that identify them as non-human

arXiv.org Artificial Intelligence

Large Language Models can emulate different writing styles, ranging from composing poetry that appears indistinguishable from that of famous poets to using slang that can convince people that they are chatting with a human online. While differences in style may not always be visible to the untrained eye, we can generally distinguish the writing of different people, like a linguistic fingerprint. This work examines whether a language model can also be linked to a specific fingerprint. Through stylometric and multidimensional register analyses, we compare human-authored and model-authored texts from different registers. We find that the model can successfully adapt its style depending on whether it is prompted to produce a Wikipedia entry vs. a college essay, but not in a way that makes it indistinguishable from humans. Concretely, the model shows more limited variation when producing outputs in different registers. Our results suggest that the model prefers nouns to verbs, thus showing a distinct linguistic backbone from humans, who tend to anchor language in the highly grammaticalized dimensions of tense, aspect, and mood. It is possible that the more complex domains of grammar reflect a mode of thought unique to humans, thus acting as a litmus test for Artificial Intelligence.

Introduction

Scholars from different disciplines have been addressing the question of what makes us human for centuries. For Nobel laureate Bertrand Russell, the answer is language, for "no matter how eloquently a dog may bark, he cannot tell you that his parents were poor but honest". Human language is both flexible and constrained at the same time, and this is why the Turing Test, described as a litmus test for Artificial Intelligence [Shieber 1994, French 2000], is linked to achieving a level of conversational proficiency that is highly complex, akin to that of a human [Turing 1950].
Human language is flexible in the sense that we all make different choices when conversing. Every human is thought to have a distinct linguistic fingerprint called idiolect [Halliday et al. 1964, Coulthard 2004]. This idiolect, which can be defined as an individual's unique use of linguistic forms (including lexical choices, collocations and fixed expressions, punctuation patterns, misspellings, and grammatical style), is critical for authorship attribution in a range of situations: from identifying that a poem with dashes, elliptical syntax, and unconventional capitalization is more likely authored by Emily Dickinson and not by William Shakespeare, to pinning down a person of interest in the course of a criminal investigation, as happened in the Unabomber case.
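
To make the idea of surface-level idiolect features concrete, here is a minimal sketch of how two of the cues named above (dash usage and unconventional capitalization) could be turned into numbers. It is an illustration only, not the stylometric or multidimensional register analysis used in the paper: the function name `style_profile`, the feature names, and the tokenization rules are all assumptions made for this example.

```python
import re

def style_profile(text: str) -> dict:
    """Compute a few crude surface-style rates (hypothetical feature set)."""
    # Split into alphabetic words (apostrophes allowed) and single punctuation marks.
    tokens = re.findall(r"[A-Za-z']+|[^\sA-Za-z']", text)
    words = [t for t in tokens if t[0].isalpha()]
    n_words = max(len(words), 1)  # avoid division by zero on empty input

    # Count typewriter-style double hyphens plus en and em dashes.
    dashes = len(re.findall(r"--|[\u2013\u2014]", text))

    # Count capitalized words that do not open a sentence (the pronoun "I" excluded).
    midcaps = 0
    sentence_start = True
    for tok in tokens:
        if tok[0].isalpha():
            is_pronoun_i = tok.split("'")[0] == "I"
            if not sentence_start and tok[0].isupper() and not is_pronoun_i:
                midcaps += 1
            sentence_start = False
        elif tok in ".!?":
            sentence_start = True

    return {
        "n_words": len(words),
        "dash_rate": dashes / n_words,
        "midcap_rate": midcaps / n_words,
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
    }
```

Comparing such profiles across texts gives a rough sense of how punctuation and capitalization habits differ between writers. Real attribution work relies on far richer feature sets, and this heuristic will also count ordinary proper nouns as mid-sentence capitals.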


Politicians vs ChatGPT. A study of presuppositions in French and Italian political communication

arXiv.org Artificial Intelligence

This paper aims to provide a comparison between texts produced by French and Italian politicians on polarizing issues, such as immigration and the European Union, and their chatbot counterparts created with ChatGPT 3.5. In this study, we focus on implicit communication, in particular on presuppositions and their functions in discourse, which have been considered in the literature as a potential linguistic feature of manipulation. This study also aims to contribute to the emerging literature on the pragmatic competences of Large Language Models. Our results show that, on average, ChatGPT-generated texts contain more questionable presuppositions than the politicians' texts. Furthermore, most presuppositions in the former texts show a different distribution and different discourse functions compared to the latter. This may be due to several factors inherent in the ChatGPT architecture, such as a tendency to be verbose and repetitive in longer texts, as exemplified by the occurrence of political slogans mainly formed by change-of-state verbs as presupposition triggers (e.g., dobbiamo costruire il nostro futuro, 'we must build our future').


On the Generalization of Training-based ChatGPT Detection Methods

arXiv.org Artificial Intelligence

ChatGPT is one of the most popular language models and achieves impressive performance on various natural language tasks. Consequently, there is also an urgent need to distinguish texts generated by ChatGPT from human-written ones. One extensively studied approach trains classification models to distinguish the two. However, existing studies also demonstrate that the trained models may suffer from distribution shifts at test time, i.e., they are ineffective at identifying generated texts from unseen language tasks or topics. In this work, we aim to comprehensively investigate the generalization behavior of these methods under distribution shifts caused by a wide range of factors, including prompts, text lengths, topics, and language tasks. To achieve this goal, we first collect a new dataset with human and ChatGPT texts, and then we conduct extensive studies on the collected dataset. Our studies unveil insightful findings which provide guidance for developing future methodologies or data collection strategies for ChatGPT detection.
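
The distribution-shift concern described above can be illustrated with a small evaluation helper that reports detector accuracy separately for topics seen during training and for unseen topics. This is a sketch under stated assumptions, not the authors' evaluation protocol: the function `accuracy_by_group`, the `(text, topic, label)` record layout, the keyword-based stub detector, and the four example texts are all hypothetical and exist only to show the bookkeeping.

```python
from collections import defaultdict
from typing import Callable, Iterable, Set, Tuple

Example = Tuple[str, str, str]  # (text, topic, label), label is "human" or "chatgpt"

def accuracy_by_group(
    examples: Iterable[Example],
    predict: Callable[[str], str],
    seen_topics: Set[str],
) -> dict:
    """Report accuracy separately for seen-topic and unseen-topic test examples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for text, topic, label in examples:
        group = "in_distribution" if topic in seen_topics else "shifted"
        total[group] += 1
        correct[group] += int(predict(text) == label)
    return {group: correct[group] / total[group] for group in total}

# Hypothetical stub standing in for a trained classifier: it keys on one word.
def stub_detector(text: str) -> str:
    return "chatgpt" if "furthermore" in text.lower() else "human"

# Made-up toy examples, for illustration only.
toy_examples = [
    ("Furthermore, the data shows steady growth.", "finance", "chatgpt"),
    ("lol no idea tbh", "finance", "human"),
    ("The patient should rest and hydrate.", "medicine", "chatgpt"),
    ("my doc said rest up", "medicine", "human"),
]

scores = accuracy_by_group(toy_examples, stub_detector, seen_topics={"finance"})
```

The same grouping could be keyed on prompt, text length, or language task instead of topic, which are the other shift factors the abstract lists. A gap between the two accuracies is the signal of interest.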